12 research outputs found

    Large Language Models Encode Clinical Knowledge

    Full text link
    Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.

    Towards Generalist Biomedical AI

    Full text link
    Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduce Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system. Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. We also report examples of zero-shot generalization to novel medical concepts and tasks, positive transfer learning across tasks, and emergent zero-shot medical reasoning. To further probe the capabilities and limitations of Med-PaLM M, we conduct a radiologist evaluation of model-generated (and human) chest X-ray reports and observe encouraging performance across model scales. In a side-by-side ranking on 246 retrospective chest X-rays, clinicians express a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility. While considerable work is needed to validate these models in real-world use cases, our results represent a milestone towards the development of generalist biomedical AI systems.

    Characterizing the ageing of GCL through electrical resistivity

    No full text
    In closed hazardous waste landfills, impermeable layered covers composed mainly of clay, sometimes combined with a Geosynthetic Clay Liner (GCL) or a geomembrane, are used to seal in the waste, minimizing water infiltration and the accumulation of leachate. An experimental landfill cap was built with a sodium-activated calcium bentonite GCL at a depth of 0.45 m, covered by gravel and topsoil. The site, equipped with a weather station and moisture sensors, was monitored for 32 months, with electrical resistivity tomography (ERT) and geotechnical measurements carried out at the end of the monitoring period. Two different methods showed that the GCL’s electrical resistivity decreased strongly 22 months after installation; moreover, it became possible to detect the defects that had been made in the GCL prior to closure to simulate factors affecting GCL performance. Analyses of GCL samples taken at two locations near the ERT profile highlighted changes in the intrinsic properties of the material: the proportions of sodium and calcium cations differed from those of the initial GCL, indicating cation exchange, and the hydraulic conductivity increased from 5×10⁻¹¹ to 3×10⁻⁶ m·s⁻¹. This study thus shows that electrical resistivity can characterize the ageing of a GCL.
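To put the reported change in hydraulic conductivity in perspective, the two values quoted in the abstract can be compared directly. The snippet below is only a worked arithmetic check; the variable names are illustrative and do not come from the paper.

```python
import math

# Hydraulic conductivity values quoted in the abstract, in m/s.
k_initial = 5e-11  # GCL at installation
k_aged = 3e-6      # GCL sampled at the end of monitoring

# Ratio of aged to initial conductivity, and the same change in orders of magnitude.
ratio = k_aged / k_initial
orders_of_magnitude = math.log10(ratio)

print(f"increase factor: {ratio:.0f}")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

The increase is a factor of about 60,000, or a little under five orders of magnitude.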

    Parameter estimation approach to banding artifact reduction in balanced steady-state free precession

    No full text
    Purpose: The balanced steady-state free precession (bSSFP) pulse sequence has been shown to be of great interest due to its high signal-to-noise ratio efficiency. However, bSSFP images often suffer from banding artifacts due to off-resonance effects, which we aim to minimize in this paper. Methods: We present a general and fast two-step algorithm for 1) estimating the unknowns in the bSSFP signal model from multiple phase-cycled acquisitions, and 2) reconstructing band-free images. The first step, Linearization for Off-Resonance Estimation (LORE), solves the nonlinear problem approximately by a robust linear approach. The second step applies a Gauss-Newton algorithm, initialized by LORE, to minimize the nonlinear least squares criterion. We name the full algorithm LORE-GN. Results: We derive the Cramér-Rao bound (CRB), a theoretical lower bound of the variance for any unbiased estimator, and show that LORE-GN is statistically efficient. Furthermore, we show that simultaneous estimation of T1 and T2 from phase-cycled bSSFP is difficult, since the CRB is high at common SNR. Using simulated, phantom, and in vivo data, we illustrate the band-reduction capabilities of LORE-GN compared to other techniques, such as sum-of-squares. Conclusion: Using LORE-GN we can successfully minimize banding artifacts in bSSFP.
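The two-step pattern described in this abstract (a linear approximation that initializes a Gauss-Newton refinement of a nonlinear least squares fit) can be illustrated on a toy problem. The sketch below is not LORE-GN and does not use the bSSFP signal model: it assumes a simple exponential decay y = a·exp(-b·t), and the function names `linear_init`, `gauss_newton`, and `sse` are hypothetical.

```python
import math

def sse(t, y, a, b):
    """Sum of squared residuals for the toy model y = a * exp(-b * t)."""
    return sum((yi - a * math.exp(-b * ti)) ** 2 for ti, yi in zip(t, y))

def linear_init(t, y):
    """Step 1: approximate fit by linear least squares on ln(y) = ln(a) - b*t (needs y > 0)."""
    n = len(t)
    z = [math.log(v) for v in y]
    t_mean, z_mean = sum(t) / n, sum(z) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxz = sum((ti - t_mean) * (zi - z_mean) for ti, zi in zip(t, z))
    slope = sxz / sxx
    return math.exp(z_mean - slope * t_mean), -slope

def gauss_newton(t, y, a, b, iterations=20, tol=1e-12):
    """Step 2: refine (a, b) by undamped Gauss-Newton on the nonlinear criterion."""
    for _ in range(iterations):
        # Jacobian columns of the model with respect to a and b.
        j_a = [math.exp(-b * ti) for ti in t]
        j_b = [-a * ti * math.exp(-b * ti) for ti in t]
        r = [yi - a * e for yi, e in zip(y, j_a)]
        # Solve the 2x2 normal equations (J^T J) delta = J^T r in closed form.
        s_aa = sum(u * u for u in j_a)
        s_ab = sum(u * v for u, v in zip(j_a, j_b))
        s_bb = sum(v * v for v in j_b)
        g_a = sum(u * ri for u, ri in zip(j_a, r))
        g_b = sum(v * ri for v, ri in zip(j_b, r))
        det = s_aa * s_bb - s_ab ** 2
        da = (s_bb * g_a - s_ab * g_b) / det
        db = (s_aa * g_b - s_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Synthetic data: a = 2.0, b = 0.5, with a small deterministic perturbation.
t = [0.5 * i for i in range(10)]
y = [2.0 * math.exp(-0.5 * ti) + 0.01 * (-1) ** i for i, ti in enumerate(t)]

a0, b0 = linear_init(t, y)           # linear starting point
a1, b1 = gauss_newton(t, y, a0, b0)  # nonlinear refinement
print(f"linear init:  a={a0:.4f}, b={b0:.4f}, SSE={sse(t, y, a0, b0):.6f}")
print(f"Gauss-Newton: a={a1:.4f}, b={b1:.4f}, SSE={sse(t, y, a1, b1):.6f}")
```

The linear step gives a cheap starting point that is already close to the solution, which is what lets the undamped Gauss-Newton iteration converge in a few steps on this toy problem.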

    Borano-nucleotides: new analogues to circumvent HIV-1 RT-mediated nucleoside drug-resistance

    No full text
    Alpha-boranophosphates suppress RT-mediated resistance when the catalytic rate of incorporation (kpol) of the analogue 5'-triphosphate is responsible for drug resistance, such as in the case of the K65R mutant and ddNTPs, and Q151M toward AZTTP and ddNTPs. This suppression is also observed with BH3-d4T and BH3-3TC toward their clinically relevant mutants Q151M and M184V. Moreover, the presence of the borano (BH3-) group renders the incorporation of the analogue independent of amino-acid substitutions in RT. To our knowledge, this is the first example of rescue of polymerase activity by means of a nucleotide analogue.